Skip to content

Suppress spurious "Failed to detach context" log in OTel interceptor#1596

Open
brianstrauch wants to merge 4 commits into
mainfrom
fix-otel-detach-context-log
Open

Suppress spurious "Failed to detach context" log in OTel interceptor#1596
brianstrauch wants to merge 4 commits into
mainfrom
fix-otel-detach-context-log

Conversation

@brianstrauch

@brianstrauch brianstrauch commented Jun 12, 2026

Copy link
Copy Markdown
Member

What

Flaky test observed here: https://github.com/temporalio/sdk-python/actions/runs/27432588766/job/81086412642?pr=1378

TracingWorkflowInboundInterceptor attaches an OTel context at the start of execute_workflow/handle_query and detaches it in the finally. The workflow event loop runs portions of the workflow inside contextvars.copy_context().run(...) (e.g. wait_condition), so that finally can execute in a copied contextvars.Context. A copy preserves the OTel context value — so the existing context is get_current() guard passes — but invalidates the attach token, so opentelemetry.context.detach fails internally and logs "Failed to detach context".

Because LogCapturer attaches a handler to the process-global opentelemetry.context logger, that stray ERROR log bled into test_opentelemetry_safe_detach whenever a tracing workflow tore down during its capture window (same xdist worker process), making the test flaky. In production it's recurring ERROR-level noise on affected workflow completions.

Fix

Route the best-effort detach through a small _safe_detach helper that drops only the "Failed to detach context" record via a scoped logging.Filter.

It still calls opentelemetry.context.detach, so attach/detach call counts stay balanced — the leak invariant enforced by test_opentelemetry_context_restored_after_activity. (Alternatives that avoid the public detach — restoring via attach(previous), or resetting the runtime contextvar directly — both break that balance test, so suppressing at the logger is the only approach that satisfies balanced calls + no spurious log + no test weakening.)

Caveat

During a detach the filter is attached to the process-global OTel context logger, so it would also drop an unrelated "Failed to detach context" emitted by other code in that brief window. Given the alternatives break a correctness invariant, this is the right trade-off.

Testing

  • tests/contrib/opentelemetry/ and tests/contrib/openai_agents/test_openai_tracing.py pass together (27/27), including the attach/detach balance test.
  • Verified directly that the cross-context detach logs "Failed to detach context" via raw detach but produces no record through _safe_detach.
  • Lint clean (ruff, format, pyright, mypy, basedpyright, pydocstyle).

The tracing workflow interceptor attaches an OTel context at the start of
execute_workflow/handle_query and detaches it in the finally. The workflow
event loop runs portions of the workflow inside
contextvars.copy_context().run(...), so the finally can execute in a copied
contextvars.Context. A copy preserves the OTel context value (so the
existing `context is get_current()` guard passes) but invalidates the attach
token, so opentelemetry.context.detach logs "Failed to detach context".

Because LogCapturer attaches to the process-global opentelemetry.context
logger, that stray log also bled into test_opentelemetry_safe_detach when a
tracing workflow tore down during its capture window, making it flaky.

Route the best-effort detach through _safe_detach, which drops only that one
log record via a scoped logging filter. It still calls
opentelemetry.context.detach so attach/detach calls stay balanced, satisfying
the leak invariant enforced by test_opentelemetry_context_restored_after_activity.
@brianstrauch brianstrauch requested a review from a team as a code owner June 12, 2026 22:45
# token.
if context is opentelemetry.context.get_current():
opentelemetry.context.detach(token)
_safe_detach(token)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is really a good change. Doing a bunch of extra custom actions to remove the log is pretty wacky, and the context actually cannot be different from the one it was created on, see the line directly above which was added for this purpose.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the context actually cannot be different from the one it was created on

The term "context" is overloaded here: The OTel context is copied by value and doesn't change, but the runtime context does, which results in the log message. I agree this is not the greatest fix, so let me try pushing a fix for the root issue. I couldn't get it working the first time I tried, but the idea is that we could restore context by reattaching the previous context instead of detaching the attach token.

… attach

The tracing interceptor attached an OTel context and detached it via
opentelemetry.context.detach(token) in a finally. detach calls
ContextVar.reset(token), which raises (and OTel logs "Failed to detach
context") when the finally runs in a different contextvars.Context than the
attach -- e.g. when the workflow event loop resumes inside
contextvars.copy_context().run(...). A copy preserves the OTel context value
(so the `context is get_current()` guard passes) but invalidates the token.

Restore the previously-current context by re-attaching it instead of detaching
the token. attach goes through ContextVar.set, which is valid in any
contextvars.Context, so the failing reset is never reached and there is no
spurious log to suppress. This replaces the _safe_detach logging-filter
workaround entirely.

Rework the leak test to model OTel's current-context ContextVar and assert it
is restored to baseline, rather than counting attach/detach calls (which no
longer balance once restoration goes through attach). Remove
test_opentelemetry_safe_detach, which only exercised the deleted suppression.
context.attach's parameter is named `context`; the leak test's tracked_attach
wrapper named it `ctx`, which mypy rejects when assigning the wrapper to
context.attach (parameter names are part of a callable's type). Match the name
and annotate with the imported Context type.
@brianstrauch brianstrauch requested a review from tconley1428 June 16, 2026 22:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants